Automatic Generation of Optimized OpenCL Codes Using OCLoptimizer

Authors

  • Jorge F. Fabeiro
  • Diego Andrade
  • Basilio B. Fraguela
  • Ramón Doallo
Abstract

The eruption of multicore processors and several kinds of accelerators has generalized the interest in parallel programming. The OpenCL standard is very appealing because it provides code portability across most of these platforms. It defines a programming model in which a host code requests the execution of kernels on computational devices. Unfortunately, the host API of OpenCL is quite verbose, which makes the development of host code tedious and error-prone. More importantly, OpenCL does not provide automatic performance portability. As a result, users have to hand-tune OpenCL codes for each specific device, which implies trying different versions of the kernels and different task partition granularities. As an answer to this situation we present OCLoptimizer, a tool that automatically generates host codes and optimizes OpenCL kernels for each specific target device based on a user-provided configuration file. This configuration file describes basic kernel characteristics, and annotations in the kernels indicate the code transformations to test. Our tool can explore different granularities for the problem decomposition as well as different alternatives for the kernel. This exploration is performed by means of an iterative optimization process whose parameters and search strategy are defined by the user specifications. The tool also supports OpenCL codes composed of multiple kernels. Experiments performed on multicore CPUs and different accelerators show that the tool is very effective, generating codes with an average speedup of 2.54 with respect to baseline hand-tuned implementations in single-kernel codes, and 1.79 in a code with multiple kernels.
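The iterative exploration described in the abstract can be illustrated with a minimal sketch. This is not OCLoptimizer's actual search strategy or configuration format, which the abstract does not specify. It assumes an exhaustive search over two hypothetical tuning parameters (`local_size` for the decomposition granularity and `unroll` for a kernel variant), and it replaces the timed kernel launch with a made-up cost function (`toy_cost`) so that it runs without an OpenCL device.

```python
import itertools
from typing import Callable, Dict, List, Tuple

Config = Dict[str, int]


def iterative_search(
    space: Dict[str, List[int]],
    measure: Callable[[Config], float],
    repeats: int = 3,
) -> Tuple[Config, float]:
    """Try every configuration in `space` and keep the fastest one.

    `measure` is assumed to return a runtime in seconds for one run;
    the minimum over `repeats` runs is used to damp timing noise.
    """
    names = list(space)
    best_cfg: Config = {}
    best_time = float("inf")
    for values in itertools.product(*(space[n] for n in names)):
        cfg = dict(zip(names, values))
        elapsed = min(measure(cfg) for _ in range(repeats))
        if elapsed < best_time:
            best_cfg, best_time = cfg, elapsed
    return best_cfg, best_time


def toy_cost(cfg: Config) -> float:
    # Hypothetical stand-in for building and timing a kernel variant.
    # It is constructed so that local_size=64 and unroll=4 are optimal.
    return 1.0 + abs(cfg["local_size"] - 64) / 64 + abs(cfg["unroll"] - 4) / 4


# Hypothetical search space: work-group sizes and loop-unroll factors.
SPACE = {"local_size": [16, 32, 64, 128, 256], "unroll": [1, 2, 4, 8]}

best_cfg, best_time = iterative_search(SPACE, toy_cost)
print(best_cfg, best_time)
```

In a real setting, `measure` would compile the kernel variant for the target device, launch it with the chosen granularity, and return the measured execution time. An exhaustive search is only one possible strategy, and larger spaces would call for a cheaper heuristic.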


Similar resources

OCLoptimizer: An Iterative Optimization Tool for OpenCL

Nowadays, computers include several computational devices with parallel capacities, such as multicore processors and Graphics Processing Units (GPUs). OpenCL enables the programming of all these kinds of devices. An OpenCL program consists of a host code, which discovers the computational devices available in the host system and queues up commands to the devices, and the kernel code which defi...


Evaluating Performance and Portability of OpenCL Programs

Recently, OpenCL, a new open programming standard for GPGPU programming, has become available in addition to CUDA. OpenCL can support various compute devices due to its higher abstraction programming framework. Since there is a semantic gap between OpenCL and compute devices, the OpenCL C compiler plays important roles to exploit the potential of compute devices and therefore its capability sho...


Speculative Execution of Parallel Programs with Precise Exception Semantics on GPUs

General purpose computing on GPUs (GPGPU) can enable significant performance and energy improvements for certain classes of applications. However, current GPGPU programming models, such as CUDA and OpenCL, are only accessible by systems experts through low-level C/C++ APIs. In contrast, large numbers of programmers use high-level languages, such as Java, due to their productivity advantages of ty...


PIPS Is not (just) Polyhedral Software Adding GPU Code Generation in PIPS

Parallel and heterogeneous computing are growing in audience thanks to the increased performance brought by ubiquitous manycores and GPUs. However, available programming models, like OpenCL or CUDA, are far from being straightforward to use. As a consequence, several automated or semi-automated approaches have been proposed to automatically generate hardware-level codes from high-level sequenti...


Automatic Code-Generation for Large Scale Numerical Models

In this paper we present an overview of current and ongoing research on the CTADEL problem-specific code generator. The CTADEL system provides an automated means of generating specific high performance scientific codes, optimized for a number of different architectures. We address problems like implicit equations and a Semi-Lagrangian method for semi-implicit schemes and show some experiments w...



Journal title:
  • Comput. J.

Volume 58, issue not listed

Pages: not listed

Publication date: 2015